Abstract

We consider energy-efficient scheduling on multiprocessors, where the speed of each
processor can be individually scaled, and a processor consumes power $s^\alpha$ if it runs at
speed $s$, where $\alpha > 1$. A scheduling algorithm needs to decide both processor allocations
and speeds for a set of parallel jobs whose parallelism can vary with time. The objective
is to minimize the sum of the overall energy consumption and a performance metric,
which in this paper is either flow time or makespan. For both objectives, we present
semi-clairvoyant algorithms that are aware of the instantaneous parallelism of the jobs
but not of their future characteristics. We present the U-Ceq algorithm for flow time plus
energy and show that it is $O(1)$-competitive. This is the first $O(1)$-competitive result
for multiprocessor speed scaling on parallel jobs. We also consider, for the first time in
the literature, makespan plus energy. We present the P-First algorithm and show that it
is $O(\ln^{1-1/\alpha} P)$-competitive for parallel jobs consisting of fully-parallel and sequential
phases, where $P$ is the total number of processors. Moreover, we prove that P-First
is asymptotically optimal in this setting by providing a matching lower bound. In
addition, we revisit non-clairvoyant scheduling for flow time plus energy and show that
the N-Equi algorithm is $O(\ln P)$-competitive. We then prove a lower bound of
$\Omega(\ln^{1/\alpha} P)$ for any non-clairvoyant algorithm.

1 Introduction 

Energy has been widely recognized as a key consideration in the design of modern high- 
performance computer systems. One popular approach to control energy is by dynamically 
changing the speeds of the processors, a technique known as dynamic speed scaling [29, 8, 16]. 
It has been observed that for most CMOS-based processors, the dynamic power consumption 
satisfies the cube-root rule, that is, the power of a processor is proportional to $s^3$ when it
runs at speed $s$ [8, 23]. Since the seminal paper by Yao, Demers and Shenker [31], however, most
researchers have assumed a more general power function $s^\alpha$, where $\alpha > 1$ is the
power parameter. As this power function is strictly convex, the total energy used to
execute a job can be significantly reduced by slowing down the processor
at the expense of the job's performance. Thus, how to optimally trade off the conflicting
objectives of energy and performance has become an active research topic in the algorithmic
community. (See [19, 1] for two excellent surveys of the field.)
In this paper, we focus on scheduling parallel jobs on multiprocessors with per-processor
speed scaling capability [18j EQ) that is, the speed of each processor can be individually 
scaled. A scheduling algorithm needs to have both a processor allocation policy, which 
decides the number of processors allocated to each job, and a speed scaling policy, which 
decides the speed of each allocated processor. Moreover, we assume that the parallel jobs 
under consideration can have time-varying parallelism. Thus, if the scheduling algorithm
is not designed properly, it may waste a large amount of energy when the jobs have
relatively low parallelism, or cause severe execution delay and hence performance degradation
when the parallelism of the jobs is high. This poses additional challenges to the speed
scaling problem for parallel jobs compared with its sequential counterpart.

We adopt the objective function proposed by Albers and Fujiwara [2] that consists of a 
linear combination of overall energy consumption and some performance metric, which in 
this paper includes total flow time and makespan. The flow time of a job is defined to be 
the duration between its release and completion. The total flow time for a set of jobs is the 
sum of flow time of all jobs, and makespan is the completion time of the last completed job 
in the job set. Both total flow time and makespan are widely used performance metrics in 
scheduling literature: the former often measures the average response time of all users in the 
system while the latter is closely related to the throughput of the system. Although energy 
and flow time (or makespan) have different units, optimizing a linear combination of the 
two can be naturally interpreted by looking at both objectives from a unified point of view. 
Suppose that the user is willing to spend one unit of energy in order to reduce $\rho$ units of total
flow time (or makespan). Then, by changing the units of time and energy, we can assume
without loss of generality that $\rho = 1$. Thus, the objective can be reduced to optimizing
the total flow time (or makespan) plus energy for a set of jobs. In fact, minimizing a sum
of conflicting objectives is quite common in bi-criteria optimization problems. In the
scheduling literature, similar metrics that combine both the performance and the cost of
scheduling in the objective function have been considered previously [28, 26, 11].

Since Albers and Fujiwara [2] first proposed total flow time plus energy, many excellent
results (see, e.g., [4, 22, 3, 9, 10, 27, 30]) have been obtained under different scheduling models. For
instance, some results assume that the scheduling algorithm is clairvoyant, that is, it gains 
complete knowledge of a job, such as its total work, immediately upon the job's arrival; 
the other results are based on a more practical non-clairvoyant model, where the scheduler 
knows nothing about the un-executed portion of a job. Most of these results, however, 
are applicable to scheduling sequential jobs on a single processor, and to the best of our 
knowledge, no previous work is known that minimizes makespan plus energy. The closest 
results to ours are from Chan, Edmonds and Pruhs [10], and Sun, Cao and Hsu [27], who
studied non-clairvoyant scheduling for parallel jobs on multiprocessors to minimize total
flow time plus energy. In both works, it is observed that any non-clairvoyant algorithm
that allocates one set of uniform-speed processors to a job performs poorly, or specifically is
$\Omega(P^{(\alpha-1)/\alpha^2})$-competitive, where $P$ is the total number of processors. The intuition is
that any non-clairvoyant algorithm may in the worst case allocate a "wrong" number of
processors to a job compared to its parallelism, and thus either incur excessive energy waste or
cause severe execution delay.

Therefore, to obtain reasonable results, a non-clairvoyant algorithm needs to be more
flexible in assigning processors of different speeds to a job. To this end, Chan, Edmonds
and Pruhs [10] assumed an execution model in which each job can be executed by multiple
groups of processors with different speeds. The execution rate of a job at any time is given by the
fastest rate among all groups. They proposed a scheduling algorithm MultiLaps and showed
that it is $O(\log P)$-competitive with respect to the total flow time plus energy. In addition,
they gave a lower bound of $\Omega(\log^{1/\alpha} P)$ for any non-clairvoyant algorithm. Sun, Cao
and Hsu [27], on the other hand, assumed a different execution model, in which only one
group of processors with different speeds is allocated to each job at any time, and the
execution rate is determined by the speeds of the fastest processors that can be effectively
utilized. They proposed algorithm N-Equi and showed that it is $O(\ln^{1/\alpha} P)$-competitive
with respect to the total flow time plus energy on batched parallel jobs (i.e., all jobs are
released at the same time). Both execution models are based on certain assumptions and can
be justified in their respective terms. It is, however, quite difficult to predict which model
is more practical to implement. In this paper, we first revisit non-clairvoyant scheduling for
total flow time plus energy under the model by Sun, Cao and Hsu, and show that:

• N-Equi is $O(\ln P)$-competitive with respect to total flow time plus energy for parallel
jobs with arbitrary release times, and any non-clairvoyant algorithm is $\Omega(\ln^{1/\alpha} P)$-
competitive. Interestingly, both results match asymptotically those obtained under
the execution model by Chan, Edmonds and Pruhs. The lower bound also suggests
that N-Equi is asymptotically optimal in the batched setting.

Moreover, in this paper we consider a new scheduling model, which we call the
semi-clairvoyant model. Compared to the non-clairvoyant model, which does not allow a
scheduler to have any knowledge about the un-executed portion of a job, we allow a
semi-clairvoyant algorithm to know the available parallelism of a job at the immediate next
step, or the instantaneous parallelism. Any future characteristic of the job, such as its
remaining parallelism and work, is still unknown. In many parallel systems using a centralized
task queue or thread pool, instantaneous parallelism is simply the number of ready
tasks in the queue or the number of ready threads in the pool, which is information practically
available to the scheduler. Even for parallel systems using distributed scheduling such
as work-stealing [6], instantaneous parallelism can be collected or estimated through
counting or sampling without introducing much system overhead. We first show that such
semi-clairvoyance about the instantaneous parallelism of the jobs can bring significant
performance improvement with respect to the total flow time plus energy. In particular,

• We present a semi-clairvoyant algorithm U-Ceq, and show that it is $O(1)$-competitive
with respect to total flow time plus energy. This is the first $O(1)$-competitive result
on multiprocessor speed scaling for parallel jobs.

Compared with non-clairvoyant algorithms, the reason for the improvement
is that, knowing the instantaneous parallelism, a semi-clairvoyant algorithm
can allocate the "right" number of processors to a job at any time, which ensures that
no energy is wasted. At the same time, it can also guarantee a sufficient execution rate
by setting the total power consumption proportional to the number of active jobs at any
time and dividing it equally among the active jobs. This has been a common practice that
intuitively provides the optimal balance between energy and total flow time [4, 22, 3, 9].
Moreover, unlike the best non-clairvoyant algorithms known so far [10, 27], which require
nonuniform speed scaling for an individual job, our semi-clairvoyant algorithm only requires
allocating processors of uniform speed to a job, and thus may be more feasible in practice.

We also consider, for the first time in the literature, the objective of makespan plus 
energy. Unlike total flow time plus energy, where the completion time of each job contributes 
to the overall objective function, makespan is the completion time of the last job, and the
other jobs only contribute to the energy consumption part of the objective, hence can be 
slowed down to improve the overall performance. However, without knowing the future 
information, such as the remaining work of the jobs, we show that it is harder to minimize 
makespan plus energy even in the semi-clairvoyant setting. Specifically, 

• We present a semi-clairvoyant algorithm P-First and show that it is $O(\ln^{1-1/\alpha} P)$-
competitive with respect to makespan plus energy for batched parallel jobs consisting
of sequential and fully-parallel phases. We also give a matching lower bound of
$\Omega(\ln^{1-1/\alpha} P)$ for any semi-clairvoyant algorithm.

In addition, compared to minimizing total flow time plus energy, where the common
practice is to set the power proportional to the number of active jobs, we show that the
optimal strategy for minimizing makespan plus energy is to set the power consumption at
a constant level, or more precisely at $\frac{1}{\alpha-1}$ at any time, where $\alpha$ is the power parameter.

The rest of this paper is organized as follows. Section 2 formally defines the models and the
objective functions. Section 3 studies both non-clairvoyant and semi-clairvoyant scheduling
for total flow time plus energy. Section 4 presents our semi-clairvoyant algorithm for
makespan plus energy. Finally, Section 5 provides some discussion and future directions.

2 Models and Objective Functions 

We model parallel jobs using time-varying parallelism profiles. Specifically, we consider a
set $\mathcal{J} = \{J_1, J_2, \dots, J_n\}$ of $n$ jobs to be scheduled on $P$ processors. Adopting the notions
in [HI HU [H2 [10], each job $J_i \in \mathcal{J}$ contains $k_i$ phases $\langle J_i^1, J_i^2, \dots, J_i^{k_i} \rangle$, and each phase $J_i^k$
has an amount of work $w_i^k$ and a linear speedup function $\Gamma_i^k$ up to a certain parallelism
$h_i^k$, where $h_i^k \ge 1$. Suppose that at any time $t$, job $J_i$ is in its $k$-th phase and is allocated
$a_i(t)$ processors, which may not all have the same speed. We assume that the execution of the
job at time $t$ is then based on the maximum utilization policy [20, 5], which always utilizes
faster processors before slower ones until the number of utilized processors reaches
the parallelism of the job. In particular, let $s_j$ denote the speed of the $j$-th allocated
processor, and we can assume without loss of generality that $s_1 \ge s_2 \ge \dots \ge s_{a_i(t)}$. Then,
only the $\bar{a}_i(t) = \min\{a_i(t), h_i^k\}$ fastest processors are effectively utilized, and the speedup or
execution rate of the job at time $t$ is given by $\Gamma_i^k(a_i(t)) = \sum_{j=1}^{\bar{a}_i(t)} s_j$. The span $l_i^k$ of phase $J_i^k$,
which is a convenient parameter representing the time to execute the phase with $h_i^k$ or more
processors of unit speed, is then given by $l_i^k = w_i^k / h_i^k$. We say that a phase is fully-parallel
if $h_i^k = \infty$ and that it is sequential if $h_i^k = 1$. Moreover, if job $J_i$ consists of only sequential and
fully-parallel phases, we call it a (Par-Seq)* job [25]. Finally, for each job $J_i$, we define its
total work to be $w(J_i) = \sum_{k=1}^{k_i} w_i^k$ and its total span to be $l(J_i) = \sum_{k=1}^{k_i} l_i^k$.
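
To make the maximum utilization policy concrete, the following is a minimal Python sketch of the execution rate and span of a single phase. The function names `execution_rate` and `span` are ours and purely illustrative; the sketch assumes speeds are given as a plain list and that a fully-parallel phase is encoded with an infinite parallelism.

```python
import math


def execution_rate(speeds, parallelism):
    """Execution rate of a phase under the maximum utilization policy.

    speeds: speeds of the processors allocated to the job (any order).
    parallelism: parallelism h of the current phase (math.inf if fully-parallel).
    Only the min(len(speeds), h) fastest processors contribute to the rate.
    """
    ordered = sorted(speeds, reverse=True)  # faster processors are used first
    if math.isinf(parallelism):
        used = len(ordered)
    else:
        used = min(len(ordered), int(parallelism))
    return sum(ordered[:used])


def span(work, parallelism):
    """Time to run the phase on h or more unit-speed processors (w / h)."""
    return work / parallelism
```

For instance, a job holding processors of speeds 1, 3 and 2 in a phase of parallelism 2 runs at rate 3 + 2 = 5, and the slowest processor contributes nothing although it still consumes power.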

At any time $t$, a scheduling algorithm needs to specify the number $a_i(t)$ of processors
allocated to each job $J_i$, as well as the speed of each allocated processor. We say that an
algorithm is non-clairvoyant if it makes all scheduling decisions without any current or
future information about the jobs, such as their release times, parallelism profiles and remaining
work. In addition, we say that an algorithm is semi-clairvoyant if it is aware only of the
current parallelism, or instantaneous parallelism, of the jobs, but not of their future parallelism
or remaining work. We require that the total processor allocation cannot exceed
the total number of processors at any time in a valid schedule, i.e., $\sum_{i=1}^{n} a_i(t) \le P$. Let
$r_i$ denote the release time of job $J_i$. If all jobs are released together in a single batch, then
their release times can all be assumed to be 0. Otherwise, we can assume without loss of
generality that the first released job arrives at time 0. Let $c_i$ denote the completion time of
job $J_i$. We also require that a valid schedule must complete all jobs in a finite amount of time
and cannot begin to execute a phase of a job unless it has completed all its preceding phases,
i.e., $r_i = c_i^0 < c_i^1 < \dots < c_i^{k_i} = c_i < \infty$ and $\int_{c_i^{k-1}}^{c_i^k} \Gamma_i^k(a_i(t))\,dt = w_i^k$ for all $1 \le k \le k_i$, where
$c_i^k$ denotes the completion time of phase $J_i^k$.

The flow time $f_i$ of any job $J_i$ is the duration between its release and completion, i.e.,
$f_i = c_i - r_i$. The total flow time $F(\mathcal{J})$ of all jobs in $\mathcal{J}$ is given by $F(\mathcal{J}) = \sum_{i=1}^{n} f_i$,
and the makespan $M(\mathcal{J})$ is the completion time of the last completed job, i.e., $M(\mathcal{J}) =
\max_{i=1,\dots,n} c_i$. Job $J_i$ is said to be active at time $t$ if it is released but not completed at
$t$, i.e., $r_i \le t \le c_i$. An alternative expression for the total flow time is $F(\mathcal{J}) = \int_0^\infty n_t\,dt$,
where $n_t$ is the number of active jobs at time $t$. For each processor at a particular time, its
power is given by $s^\alpha$ if it runs at speed $s$, where $\alpha > 1$ is the power parameter. Hence, if a
processor is not allocated to any job, we can set its speed to 0, so that it does not consume any
power. Let $u_i(t)$ denote the power consumed by job $J_i$ at time $t$, i.e., $u_i(t) = \sum_{j=1}^{a_i(t)} s_j^\alpha$. The
overall energy consumption $e_i$ of the job is given by $e_i = \int_0^\infty u_i(t)\,dt$, and the total energy
consumption $E(\mathcal{J})$ of the job set is $E(\mathcal{J}) = \sum_{i=1}^{n} e_i$, or alternatively $E(\mathcal{J}) = \int_0^\infty u_t\,dt$,
where $u_t = \sum_{i=1}^{n} u_i(t)$ denotes the total power consumption of all jobs at time $t$. In this
paper, we consider total flow time plus energy $G(\mathcal{J})$ and makespan plus energy $H(\mathcal{J})$ of
the job set, i.e., $G(\mathcal{J}) = F(\mathcal{J}) + E(\mathcal{J})$ and $H(\mathcal{J}) = M(\mathcal{J}) + E(\mathcal{J})$. The objective is to
minimize either $G(\mathcal{J})$ or $H(\mathcal{J})$.
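
As a small illustration of the two objectives, the sketch below evaluates $G(\mathcal{J})$ and $H(\mathcal{J})$ for a finished schedule. It assumes, for illustration only, that the schedule is summarized by per-job release times, completion times and energy totals; the function names are hypothetical.

```python
def flow_time_plus_energy(releases, completions, energies):
    """G(J) = F(J) + E(J): sum of per-job flow times plus total energy."""
    total_flow = sum(c - r for r, c in zip(releases, completions))
    return total_flow + sum(energies)


def makespan_plus_energy(completions, energies):
    """H(J) = M(J) + E(J): latest completion time plus total energy."""
    return max(completions) + sum(energies)
```

With two jobs released at times 0 and 1, completed at times 3 and 4, and consuming 2.0 and 1.5 units of energy, the total flow time is 6, so $G = 9.5$, while the makespan is 4, so $H = 7.5$.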

We use competitive analysis [7] to evaluate an online scheduling algorithm by comparing
its performance with that of an optimal offline scheduler. An online algorithm A is said
to be $c_1$-competitive with respect to total flow time plus energy if $G_A(\mathcal{J}) \le c_1 \cdot G^*(\mathcal{J})$ for
any job set $\mathcal{J}$, where $G^*(\mathcal{J})$ denotes the total flow time plus energy of $\mathcal{J}$ under an optimal
offline scheduler. Similarly, an online algorithm B is said to be $c_2$-competitive with respect
to makespan plus energy if for any job set $\mathcal{J}$ we have $H_B(\mathcal{J}) \le c_2 \cdot H^*(\mathcal{J})$, where $H^*(\mathcal{J})$
denotes the makespan plus energy of the job set under an optimal offline scheduler.

3 Total Flow Time Plus Energy 

We consider the objective of total flow time plus energy in this section. We first revisit
the non-clairvoyant algorithm N-Equi [27] by showing its competitive ratio for jobs with
arbitrary release times. We then derive a lower bound on the competitive ratio of any non-clairvoyant
algorithm. Finally, we present a semi-clairvoyant algorithm U-Ceq and show that it
significantly improves upon any non-clairvoyant algorithm.

3.1 Preliminaries 

We first derive a lower bound on the total flow time plus energy of any scheduler, which 
will help us conveniently bound the performance of the online algorithms through indirect 
comparison instead of comparing directly with the optimal. 

Lemma 1 The total flow time plus energy of any set $\mathcal{J}$ of $n$ jobs under the optimal scheduler
satisfies $G^*(\mathcal{J}) \ge G_1^*(\mathcal{J}) = \frac{\alpha}{(\alpha-1)^{1-1/\alpha}} \sum_{i=1}^{n} \sum_{k=1}^{k_i} \frac{w_i^k}{(h_i^k)^{1-1/\alpha}}$.

Proof. Consider any phase $J_i^k$ of job $J_i$. The optimal scheduler will only perform better
if there is an unlimited number of processors at its disposal. In this case, it will allocate
$a$ processors of the same speed, say $s$, to the phase throughout its execution, since by the
convexity of the power function, if different speeds are used, then averaging the speeds will
result in the same execution rate but less energy consumed [31]. Moreover, we have $a \le h_i^k$,
since allocating more processors to a phase than its parallelism will incur more energy
without improving flow time. The flow time plus energy incurred by the execution of $J_i^k$
is then given by

$$\frac{w_i^k}{a s} + a s^\alpha \cdot \frac{w_i^k}{a s} = w_i^k \left( \frac{1}{a s} + s^{\alpha-1} \right) \ge \frac{\alpha}{(\alpha-1)^{1-1/\alpha}} \cdot \frac{w_i^k}{a^{1-1/\alpha}} \ge \frac{\alpha}{(\alpha-1)^{1-1/\alpha}} \cdot \frac{w_i^k}{(h_i^k)^{1-1/\alpha}}.$$

Extending this property over all phases and all jobs gives the lower bound. □
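
The single-phase bound used in the proof can be checked numerically. The sketch below is only an illustration under the stated model: it evaluates the cost $w(1/(as) + s^{\alpha-1})$ of running a phase of work $w$ on $a$ processors of uniform speed $s$, and compares it with the closed-form minimum, which is attained at $s = (1/((\alpha-1)a))^{1/\alpha}$. All function names are ours.

```python
def phase_cost(work, a, s, alpha):
    """Flow time plus energy of one phase on `a` processors of speed `s`."""
    duration = work / (a * s)              # time to finish the phase
    return duration + a * s ** alpha * duration


def best_speed(a, alpha):
    """Speed minimizing phase_cost, obtained by setting the derivative to 0."""
    return (1.0 / ((alpha - 1) * a)) ** (1 / alpha)


def lemma1_bound(work, a, alpha):
    """Closed-form minimum: alpha / (alpha-1)^(1-1/alpha) * w / a^(1-1/alpha)."""
    e = 1 - 1 / alpha
    return alpha / (alpha - 1) ** e * work / a ** e
```

Evaluating `phase_cost` at `best_speed(a, alpha)` reproduces `lemma1_bound`, and any other speed gives a larger cost; the bound also decreases as `a` grows, which is why the proof caps `a` at the parallelism of the phase.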

We now outline the amortized local competitiveness argument [3] used to prove the competitive
ratio of any online scheduling algorithm A. We first define some notation. For any job set
$\mathcal{J}$ at time $t$, let $\frac{dG_A(\mathcal{J}(t))}{dt}$ denote the rate of change of flow time plus energy under online
algorithm A, and let $\frac{dG^*(\mathcal{J}^*(t))}{dt}$ denote the rate of change of flow time plus energy under
the optimal scheduler. Apparently, we have $\frac{dG_A(\mathcal{J}(t))}{dt} = n_t + u_t$ and $\frac{dG^*(\mathcal{J}^*(t))}{dt} = n_t^* + u_t^*$, where
$n_t^*$ and $u_t^*$ denote the number of active jobs and the power under the optimal scheduler at time $t$.
Moreover, we let $\frac{dG_1^*(\mathcal{J}(t))}{dt}$ denote the rate of change of the lower bound given in Lemma 1
with respect to the execution of the job set under A at time $t$. We also need to define a
potential function $\Phi(t)$ associated with the status of the job set at any time $t$ under both
the online algorithm and the optimal scheduler. Then, we can similarly define $\frac{d\Phi(t)}{dt}$ to be the rate
of change of the potential function at $t$. The following lemma shows that the competitive
ratio of algorithm A can be obtained by bounding the instantaneous performance of A at
any time $t$ with respect to the optimal scheduler through these rates of change.

Lemma 2 Suppose that an online algorithm A schedules a set $\mathcal{J}$ of jobs. Then A is
$(c_1 + c_2)$-competitive with respect to total flow time plus energy if, given a potential function
$\Phi(t)$, the execution of the job set under A satisfies

- Boundary condition: $\Phi(0) \le 0$ and $\Phi(\infty) \ge 0$;

- Arrival condition: $\Phi(t)$ does not increase when a new job arrives;

- Completion condition: $\Phi(t)$ does not increase when a job completes under either A or
the optimal offline scheduler;

- Running condition: $\frac{dG_A(\mathcal{J}(t))}{dt} + \frac{d\Phi(t)}{dt} \le c_1 \cdot \frac{dG^*(\mathcal{J}^*(t))}{dt} + c_2 \cdot \frac{dG_1^*(\mathcal{J}(t))}{dt}$.

Proof. Let $T$ denote the set of time instants at which a job arrives or completes under either
the online algorithm A or the optimal offline scheduler. Integrating the running condition
over time, we get $G_A(\mathcal{J}) + \Phi(\infty) - \Phi(0) + \sum_{t \in T} \left( \Phi(t^-) - \Phi(t^+) \right) \le c_1 \cdot G^*(\mathcal{J}) + c_2 \cdot G_1^*(\mathcal{J})$,
where $t^-$ and $t^+$ denote the time instants right before and after time $t$. Now, applying the
boundary, arrival and completion conditions to the above inequality, we get $G_A(\mathcal{J}) \le
c_1 \cdot G^*(\mathcal{J}) + c_2 \cdot G_1^*(\mathcal{J})$. Since $G_1^*(\mathcal{J})$ is a lower bound on the total flow time plus energy of
job set $\mathcal{J}$ according to Lemma 1, the performance of algorithm A thus satisfies $G_A(\mathcal{J}) \le
(c_1 + c_2) \cdot G^*(\mathcal{J})$. □



3.2 Non-clairvoyant Algorithm: N-EQUI 

In this subsection, we revisit the non-clairvoyant algorithm N-Equi (Nonuniform Equipartition)
[27], which is described in Algorithm 1. The idea of N-Equi is that at any time it
allocates an equal share $P/n_t$ of processors to each active job, and the speeds of the allocated
processors are set to be monotonically decreasing according to a scaled version of the harmonic
series. We assume that the processor allocation $P/n_t$ is always an integer; otherwise, by
rounding it to $\lfloor P/n_t \rfloor$, the bounds derived will increase by at most a constant factor.



Algorithm 1 N-Equi (at any time $t$)

1: allocate $a_i(t) = P/n_t$ processors to each active job $J_i$,

2: set the speed of the $j$-th processor allocated to job $J_i$ as $s_{ij}(t) = \left( \frac{1}{(\alpha-1) H_P \cdot j} \right)^{1/\alpha}$, where
$1 \le j \le a_i(t)$ and $H_P = 1 + \frac{1}{2} + \dots + \frac{1}{P}$ is the $P$-th harmonic number.
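
The two steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes the speed rule $s_{ij}(t) = (1/((\alpha-1) H_P \cdot j))^{1/\alpha}$, uses integer division for the share, and the names `n_equi` and `job_power` are ours.

```python
def harmonic(p):
    """P-th harmonic number H_P = 1 + 1/2 + ... + 1/P."""
    return sum(1.0 / j for j in range(1, p + 1))


def n_equi(num_processors, num_active, alpha):
    """Per-job processor share and speed profile under N-Equi.

    Every active job receives the same share and the same list of speeds,
    which decrease with the processor index j.
    """
    share = num_processors // num_active  # assumed to divide evenly in the text
    h_p = harmonic(num_processors)
    speeds = [(1.0 / ((alpha - 1) * h_p * j)) ** (1 / alpha)
              for j in range(1, share + 1)]
    return share, speeds


def job_power(speeds, alpha):
    """Power drawn by one job: the sum of s^alpha over its processors."""
    return sum(s ** alpha for s in speeds)
```

Because the per-job power equals $H_{P/n_t} / ((\alpha-1) H_P)$, it never exceeds $1/(\alpha-1)$, with equality only when a single job holds all $P$ processors.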

At time $t$, when job $J_i$ is in its $k$-th phase, we say that it is satisfied if its processor
allocation is at least its instantaneous parallelism, i.e., $a_i(t) \ge h_i^k$. Otherwise, the job is
deprived, i.e., $a_i(t) < h_i^k$. Let $\mathcal{J}_S(t)$ and $\mathcal{J}_D(t)$ denote the sets of satisfied and deprived jobs
at time $t$, respectively. For convenience, we let $n_t^S = |\mathcal{J}_S(t)|$ and $n_t^D = |\mathcal{J}_D(t)|$. Since a
job is either satisfied or deprived, we have $n_t = n_t^S + n_t^D$. Moreover, we define $x_t = n_t^D / n_t$
to be the deprived ratio. Let $\bar{a}_i(t) = \min\{a_i(t), h_i^k\}$. By approximating summations with
integrals, the execution rate of job $J_i$ at time $t$ can be shown to satisfy

$$\left( \frac{1}{(\alpha-1) H_P} \right)^{1/\alpha} \frac{\bar{a}_i(t)^{1-1/\alpha}}{2^{1/\alpha}} \le \Gamma_i^k(a_i(t)) \le \left( \frac{1}{(\alpha-1) H_P} \right)^{1/\alpha} \frac{\bar{a}_i(t)^{1-1/\alpha}}{1-1/\alpha}.$$

Moreover, the power consumption of job $J_i$ satisfies $u_i(t) \le \frac{1}{\alpha-1}$, and hence the overall
power consumption satisfies $u_t \le \frac{n_t}{\alpha-1}$.

To bound the performance of N-Equi, we adopt the potential function used by Lam et al. [22]
in the analysis of online speed scaling algorithms for sequential jobs. Specifically, we define
$n_t(z)$ to be the number of active jobs whose remaining work is at least $z$ at time $t$ under
N-Equi, and define $n_t^*(z)$ to be the number of active jobs whose remaining work is at least
$z$ under the optimal scheduler. The potential function is defined to be

$$\Phi(t) = \eta \int_0^\infty \left[ \sum_{i=1}^{n_t(z)} i^{1-1/\alpha} - n_t(z)^{1-1/\alpha}\, n_t^*(z) \right] dz, \qquad (1)$$

where $\eta = \eta' \frac{H_P^{1/\alpha}}{P^{1-1/\alpha}}$ and $\eta'$ is a constant to be specified later. With the help of Lemma 2,
the competitive ratio of N-Equi is proved in the following theorem.

Theorem 3 N-Equi is $O(\ln P)$-competitive with respect to the total flow time plus energy
for any set of parallel jobs, where $P$ is the total number of processors.

Proof. We will show that the execution of any job set under N-Equi (NE for short)
satisfies the boundary, arrival and completion conditions, as well as the running condition

$$\frac{dG_{NE}(\mathcal{J}(t))}{dt} + \frac{d\Phi(t)}{dt} \le c_1 \cdot \frac{dG^*(\mathcal{J}^*(t))}{dt} + c_2 \cdot \frac{dG_1^*(\mathcal{J}(t))}{dt},$$

where $c_1 = O(\ln P)$ and $c_2 = O(\ln^{1/\alpha} P)$. Then the theorem is directly implied.

- Boundary condition: At time 0, no jobs exist, so $n_t(z)$ and $n_t^*(z)$ are 0 for all $z$. Hence,
$\Phi(0) = 0$. At time $\infty$, all jobs are completed, so again $\Phi(\infty) = 0$.

- Arrival condition: Let $t^-$ and $t^+$ denote the instants right before and after a new
job with work $w$ arrives at time $t$. Hence, we have $n_{t^+}(z) = n_{t^-}(z) + 1$ for $z \le w$ and
$n_{t^+}(z) = n_{t^-}(z)$ for $z > w$, and similarly $n_{t^+}^*(z) = n_{t^-}^*(z) + 1$ for $z \le w$ and $n_{t^+}^*(z) = n_{t^-}^*(z)$
for $z > w$. For convenience, we define $\phi_t(z) = \sum_{i=1}^{n_t(z)} i^{1-1/\alpha} - n_t(z)^{1-1/\alpha}\, n_t^*(z)$. It
is obvious that for $z > w$, we have $\phi_{t^+}(z) = \phi_{t^-}(z)$. For $z \le w$, we get $\phi_{t^+}(z) -
\phi_{t^-}(z) = n_{t^-}^*(z) \left( n_{t^-}(z)^{1-1/\alpha} - (n_{t^-}(z) + 1)^{1-1/\alpha} \right) \le 0$. Hence, $\Phi(t^+) = \eta \int_0^\infty \phi_{t^+}(z)\,dz \le
\eta \int_0^\infty \phi_{t^-}(z)\,dz = \Phi(t^-)$.

- Completion condition: When a job completes under either N-Equi or the optimal scheduler,
$\Phi(t)$ is unchanged since $n_t(z)$ or $n_t^*(z)$ is unchanged for all $z > 0$.

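- Running condition: At any time $t$, suppose that the optimal offline scheduler sets the
speed of the $j$-th processor to $s_j^*$. We have $\frac{dG_{NE}(\mathcal{J}(t))}{dt} = n_t + u_t \le \frac{\alpha}{\alpha-1} n_t$ and
$\frac{dG^*(\mathcal{J}^*(t))}{dt} = n_t^* + u_t^* = n_t^* + \sum_{j=1}^{P} (s_j^*)^\alpha$. To bound the rate of change $\frac{dG_1^*(\mathcal{J}(t))}{dt}$, we consider each
satisfied job $J_i \in \mathcal{J}_S(t)$. Suppose that at time $t$, $J_i$ is in its $k$-th phase under N-Equi. Then
the execution rate of the job satisfies $\Gamma_i^k(a_i(t)) \ge \left( \frac{1}{(\alpha-1) H_P} \right)^{1/\alpha} \frac{(h_i^k)^{1-1/\alpha}}{2^{1/\alpha}}$. Since $\frac{dG_1^*(\mathcal{J}(t))}{dt}$
only depends on the parts of the jobs that are executed by N-Equi at time $t$, we have

$$\frac{dG_1^*(\mathcal{J}(t))}{dt} \ge \frac{\alpha}{(\alpha-1)^{1-1/\alpha}} \sum_{J_i \in \mathcal{J}_S(t)} \frac{\Gamma_i^k(a_i(t))}{(h_i^k)^{1-1/\alpha}} \ge \frac{\alpha}{\alpha-1} \left( \frac{1}{2 H_P} \right)^{1/\alpha} n_t^S = \frac{\alpha}{\alpha-1} \left( \frac{1}{2 H_P} \right)^{1/\alpha} (1 - x_t)\, n_t.$$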

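Now, we focus on finding an upper bound on the rate of change $\frac{d\Phi(t)}{dt}$ of the potential
function $\Phi(t)$ at time $t$. In particular, we consider the set $\mathcal{J}_D(t)$ of deprived jobs. In the
worst case, the $n_t^D$ deprived jobs may have the most remaining work. Again, we assume
that at time $t$ job $J_i \in \mathcal{J}_D(t)$ is in its $k$-th phase under N-Equi. The change of the potential
function can then be bounded by

$$\frac{d\Phi(t)}{dt} \le \frac{\eta}{dt} \int_0^\infty \left[ \sum_{i=1}^{n_{t+dt}(z)} i^{1-1/\alpha} - \sum_{i=1}^{n_t(z)} i^{1-1/\alpha} \right] dz$$
$$\qquad + \frac{\eta}{dt} \int_0^\infty \left[ n_t(z)^{1-1/\alpha} \left( n_t^*(z) - n_{t+dt}^*(z) \right) + n_t^*(z) \left( n_t(z)^{1-1/\alpha} - n_{t+dt}(z)^{1-1/\alpha} \right) \right] dz$$
$$\le \frac{\eta' H_P^{1/\alpha}}{P^{1-1/\alpha}} \left( - \sum_{i=1}^{n_t^D} i^{1-1/\alpha}\, \Gamma_i^k(a_i(t)) + n_t^{1-1/\alpha} \sum_{j=1}^{P} s_j^* + n_t^* \sum_{i=1}^{n_t} \left( i^{1-1/\alpha} - (i-1)^{1-1/\alpha} \right) \Gamma_i^k(a_i(t)) \right).$$

We can get $\sum_{i=1}^{n_t^D} i^{1-1/\alpha} \ge \int_0^{n_t^D} i^{1-1/\alpha}\,di = \frac{(n_t^D)^{2-1/\alpha}}{2-1/\alpha} \ge \frac{x_t^2\, n_t^{2-1/\alpha}}{2}$ and $\sum_{i=1}^{n_t} \left( i^{1-1/\alpha} - (i-1)^{1-1/\alpha} \right) = n_t^{1-1/\alpha}$.
Moreover, according to Lemma 4, we have

$$n_t^{1-1/\alpha} \sum_{j=1}^{P} s_j^* \le \frac{\lambda (H_P \cdot P)^{1-1/\alpha}}{\alpha} \sum_{j=1}^{P} (s_j^*)^\alpha + \frac{1-1/\alpha}{\lambda^{1/(\alpha-1)} (H_P \cdot P)^{1/\alpha}}\, P\, n_t,$$

where $\lambda$ is a constant to be specified later. Substituting these bounds
as well as the upper and lower bounds of $\Gamma_i^k(a_i(t))$ into the above inequality and simplifying, we have

$$\frac{d\Phi(t)}{dt} \le \eta' \left( - \frac{x_t^2}{2^{1+1/\alpha} (\alpha-1)^{1/\alpha}}\, n_t + \frac{\lambda H_P}{\alpha} \sum_{j=1}^{P} (s_j^*)^\alpha + \frac{1-1/\alpha}{\lambda^{1/(\alpha-1)}}\, n_t + \frac{\alpha}{(\alpha-1)^{1+1/\alpha}}\, n_t^* \right). \qquad (2)$$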

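Now, we set $\eta' = \frac{4\alpha^2}{(\alpha-1)^{1-1/\alpha}}$ and $\lambda = 4^{\alpha-1} (\alpha-1)^{1-1/\alpha}$. Substituting Inequality (2) as
well as the rates of change $\frac{dG_{NE}(\mathcal{J}(t))}{dt}$, $\frac{dG^*(\mathcal{J}^*(t))}{dt}$ and $\frac{dG_1^*(\mathcal{J}(t))}{dt}$ into the running condition,
we can see that in order to satisfy it for all values of $x_t$, the multipliers $c_1$ and $c_2$ can be
set to $c_1 = \max\left\{ \frac{4\alpha^3}{(\alpha-1)^2},\ 4^\alpha \alpha H_P \right\}$ and $c_2 = 2\alpha \cdot (2 H_P)^{1/\alpha}$. Since $\alpha$ can be considered a
constant, and it is well known that $H_P = O(\ln P)$, the theorem is proved. □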
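
The arrival condition used in the proof can be sanity-checked with a few lines of Python. The sketch assumes that the integrand of the potential function at a fixed $z$ has the form $\phi(n, n^*) = \sum_{i=1}^{n} i^{1-1/\alpha} - n^{1-1/\alpha} n^*$, where $n$ and $n^*$ are the job counts under the online algorithm and the optimal scheduler; the function name `phi` is ours.

```python
def phi(n, n_opt, alpha):
    """Integrand of the potential function for job counts n and n_opt."""
    e = 1 - 1 / alpha
    return sum(i ** e for i in range(1, n + 1)) - n ** e * n_opt


def arrival_change(n, n_opt, alpha):
    """Change of phi when one job arrives and both counts increase by one."""
    return phi(n + 1, n_opt + 1, alpha) - phi(n, n_opt, alpha)
```

For every pair of counts the change equals $n^* (n^{1-1/\alpha} - (n+1)^{1-1/\alpha})$, which is never positive, so an arrival cannot increase the potential.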

Lemma 4 For any $n_t \ge 0$, $s_j^* \ge 0$ and $\lambda > 0$, we have that

$$n_t^{1-1/\alpha}\, s_j^* \le \frac{\lambda (H_P \cdot P)^{1-1/\alpha}}{\alpha} (s_j^*)^\alpha + \frac{1-1/\alpha}{\lambda^{1/(\alpha-1)} (H_P \cdot P)^{1/\alpha}}\, n_t.$$

Proof. The lemma is a direct result of Young's Inequality [17], which is stated formally as
follows. If $f$ is a continuous and strictly increasing function on $[0, c]$ with $c > 0$, $f(0) = 0$,
$a \in [0, c]$ and $b \in [0, f(c)]$, then $ab \le \int_0^a f(x)\,dx + \int_0^b f^{-1}(x)\,dx$, where $f^{-1}$ is the inverse
function of $f$. By setting $f(x) = \lambda (H_P \cdot P)^{1-1/\alpha} x^{\alpha-1}$, $a = s_j^*$ and $b = n_t^{1-1/\alpha}$, the lemma
is directly implied. □
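
The inequality can also be checked numerically. The sketch below assumes the substitution $f(x) = c\,x^{\alpha-1}$ with $c = \lambda (H_P \cdot P)^{1-1/\alpha}$, for which Young's Inequality gives the right-hand side $\frac{c}{\alpha} s^\alpha + (1 - 1/\alpha)\, c^{-1/(\alpha-1)}\, n_t$. The function name and argument order are ours.

```python
def lemma4_sides(n_t, s, lam, alpha, p, h_p):
    """Return (lhs, rhs) of the inequality for one optimal-processor speed s."""
    e = 1 - 1 / alpha
    c = lam * (h_p * p) ** e                      # coefficient of f(x) = c x^(alpha-1)
    lhs = n_t ** e * s
    rhs = c / alpha * s ** alpha + e * c ** (-1 / (alpha - 1)) * n_t
    return lhs, rhs
```

Over a grid of job counts, speeds and values of $\lambda$, the left-hand side stays below the right-hand side, as expected from Young's Inequality.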

3.3 Lower Bound of Non-clairvoyant Algorithm 

In this subsection, we prove a lower bound on the competitive ratio of any deterministic 
non-clairvoyant algorithm. In particular, this lower bound matches asymptotically the 
upper bound of N-Equi for batched parallel jobs [27], hence suggests that N-Equi is 
asymptotically optimal in the batched setting. 

Theorem 5 Any deterministic non-clairvoyant algorithm is $\Omega(\ln^{1/\alpha} P)$-competitive with
respect to the total flow time plus energy, where $P$ is the total number of processors.

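Proof. Consider a job set $\mathcal{J}$ of a single job with constant parallelism $h$ and work $w$,
where $1 \le h \le P$ and $w > 0$. For any non-clairvoyant algorithm A, we can assume
without loss of generality that it allocates all $P$ processors to the job with speeds satisfying
$s_1 \ge s_2 \ge \dots \ge s_P \ge 0$, which do not change throughout the execution. Let $u = \sum_{j=1}^{P} s_j^\alpha$
denote the power of A at any time. The flow time plus energy of $\mathcal{J}$ scheduled by A is
$G_A(\mathcal{J}) = (1 + u) \frac{w}{\sum_{j=1}^{h} s_j}$. The optimal offline scheduler, knowing the parallelism $h$, will
allocate exactly $h$ processors of speed $\left( \frac{1}{(\alpha-1) h} \right)^{1/\alpha}$, thus incurring flow time plus energy of
$G^*(\mathcal{J}) = \frac{\alpha}{(\alpha-1)^{1-1/\alpha}} \cdot \frac{w}{h^{1-1/\alpha}}$. The competitive ratio of A is

$$\frac{G_A(\mathcal{J})}{G^*(\mathcal{J})} = \frac{(\alpha-1)^{1-1/\alpha}}{\alpha} (1 + u) \cdot \frac{h^{1-1/\alpha}}{\sum_{j=1}^{h} s_j}.$$

The adversary will choose the parallelism $h$ to maximize this ratio, i.e., to find $\max_{1 \le h \le P} \frac{h^{1-1/\alpha}}{\sum_{j=1}^{h} s_j}$,
while the online algorithm A chooses $(s_1, \dots, s_P)$ to minimize $\max_{1 \le h \le P} \frac{h^{1-1/\alpha}}{\sum_{j=1}^{h} s_j}$ regardless
of the choice of $h$. According to Lemma 6, $\max_{1 \le h \le P} \frac{h^{1-1/\alpha}}{\sum_{j=1}^{h} s_j}$ is minimized when
$\frac{h^{1-1/\alpha}}{\sum_{j=1}^{h} s_j} = \frac{(h-1)^{1-1/\alpha}}{\sum_{j=1}^{h-1} s_j}$ for $h = 2, \dots, P$. Hence, the best non-clairvoyant algorithm will
set $s_j = \left( j^{1-1/\alpha} - (j-1)^{1-1/\alpha} \right) s_1$ for $j = 1, 2, \dots, P$. Since $j^{1-1/\alpha} - (j-1)^{1-1/\alpha} \ge \frac{1-1/\alpha}{j^{1/\alpha}}$,
we have $s_j \ge \frac{\alpha-1}{\alpha\, j^{1/\alpha}}\, s_1$. Substituting them into $u = \sum_{j=1}^{P} s_j^\alpha$, we get $s_1 \le \frac{\alpha\, u^{1/\alpha}}{(\alpha-1) H_P^{1/\alpha}}$.
The competitive ratio of any non-clairvoyant algorithm therefore satisfies

$$\frac{G_A(\mathcal{J})}{G^*(\mathcal{J})} = \frac{(\alpha-1)^{1-1/\alpha}}{\alpha} \cdot \frac{1 + u}{s_1} \ge \frac{(\alpha-1)^{2-1/\alpha}}{\alpha^2} \cdot \frac{1 + u}{u^{1/\alpha}} \cdot H_P^{1/\alpha} \ge \frac{\alpha-1}{\alpha} H_P^{1/\alpha}.$$

The last inequality holds because $\frac{1+u}{u^{1/\alpha}}$ is minimized when
$u = \frac{1}{\alpha-1}$. Since $H_P = \Theta(\ln P)$, the theorem is proved. □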
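
The adversarial instance in the proof is easy to simulate. The following sketch is illustrative only: it assumes the speed profile $s_j = (j^{1-1/\alpha} - (j-1)^{1-1/\alpha})\, s_1$ and a single job of unit work, and computes the resulting competitive ratio for every choice of the parallelism $h$. All function names are ours.

```python
def harmonic(p):
    """P-th harmonic number."""
    return sum(1.0 / j for j in range(1, p + 1))


def balanced_speeds(p, alpha, s1):
    """Speeds that equalize h^(1-1/alpha) / (s_1 + ... + s_h) over all h."""
    e = 1 - 1 / alpha
    return [(j ** e - (j - 1) ** e) * s1 for j in range(1, p + 1)]


def competitive_ratio(speeds, h, alpha):
    """G_A / G* for one job of unit work and constant parallelism h."""
    e = 1 - 1 / alpha
    u = sum(s ** alpha for s in speeds)           # power drawn by the online algorithm
    online = (1 + u) / sum(speeds[:h])            # only the h fastest processors help
    optimal = alpha / (alpha - 1) ** e / h ** e   # cost of the offline optimum
    return online / optimal
```

With these speeds the ratio is the same for every $h$, so the adversary gains nothing by its choice, and for any $s_1$ the ratio stays above $\frac{\alpha-1}{\alpha} H_P^{1/\alpha}$.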

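Lemma 6 For any $P \ge 1$, $\alpha > 1$ and $b > 0$, subject to the condition that $\sum_{j=1}^{P} s_j^\alpha = b$
and $s_1 \ge s_2 \ge \dots \ge s_P \ge 0$, $\max_{1 \le h \le P} \frac{h^{1-1/\alpha}}{\sum_{j=1}^{h} s_j}$ is minimized when $(s_1, s_2, \dots, s_P)$ satisfy
$\frac{h^{1-1/\alpha}}{\sum_{j=1}^{h} s_j} = \frac{(h-1)^{1-1/\alpha}}{\sum_{j=1}^{h-1} s_j}$ for all $h = 2, \dots, P$.

Proof. The proof is in Appendix A. □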






3.4 Semi-clairvoyant Algorithm: U-CEQ 



We now present our semi-clairvoyant scheduling algorithm U-Ceq (Uniform Conservative
Equi) and analyze its total flow time plus energy. In particular, we show that
semi-clairvoyance makes a big difference to the performance of an online algorithm by proving
that U-Ceq is $O(1)$-competitive. As shown in Algorithm 2, U-Ceq at any time
$t$ works similarly to N-Equi in terms of processor allocation, except that it never
allocates more processors than a job's instantaneous parallelism $h_i^k$. Moreover, the speed of all
processors allocated to a job in U-Ceq is set in a uniform manner.

Algorithm 2 U-Ceq (at any time $t$)

1: allocate $a_i(t) = \min\{h_i^k, P/n_t\}$ processors to each active job $J_i$,

2: set the speed of all processors allocated to job $J_i$ as $s_i(t) = \left( \frac{1}{(\alpha-1)\, a_i(t)} \right)^{1/\alpha}$.
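
A minimal Python sketch of one scheduling step of U-Ceq follows. It assumes that the instantaneous parallelism of every active job is supplied as a list (with `float("inf")` for a fully-parallel phase), that there are no more active jobs than processors, and that the uniform speed is $(1/((\alpha-1)\, a_i(t)))^{1/\alpha}$; the function name `u_ceq` is ours.

```python
def u_ceq(parallelisms, num_processors, alpha):
    """Return a list of (allocation, uniform speed) pairs, one per active job.

    parallelisms: instantaneous parallelism of each active job.
    Each job gets at most the equal share P / n_t and never more than its
    parallelism, so no allocated processor is left idle.
    """
    share = num_processors // len(parallelisms)  # assumed to divide evenly
    schedule = []
    for h in parallelisms:
        a = min(h, share)                        # conservative allocation
        s = (1.0 / ((alpha - 1) * a)) ** (1 / alpha)
        schedule.append((a, s))
    return schedule
```

Every job then draws power $a \cdot s^\alpha = 1/(\alpha-1)$ regardless of its allocation, while its execution rate $a \cdot s$ grows as $a^{1-1/\alpha}$.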



Again, we say that active job $J_i$ is satisfied at time $t$ if $a_i(t) = h_i^k$, and that it is
deprived if $a_i(t) < h_i^k$. We can see that job $J_i$ at time $t$ scheduled by U-Ceq has execution
rate $\Gamma_i^k(a_i(t)) = \frac{a_i(t)^{1-1/\alpha}}{(\alpha-1)^{1/\alpha}}$ and consumes power $u_i(t) = \frac{1}{\alpha-1}$. Therefore, the overall power
consumption is $u_t = \frac{n_t}{\alpha-1}$. Since there is no energy waste, we will show that this execution
rate is sufficient to ensure the competitive performance of the U-Ceq algorithm.

Theorem 7 U-Ceq is $O(1)$-competitive with respect to the total flow time plus energy for
any set of parallel jobs.

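Proof. As with N-Equi, we prove the $O(1)$-competitiveness of U-Ceq using the amortized
local competitiveness argument with the same potential function given in Eq. (1), but $\eta$
is now set to $\eta = \frac{\eta'}{P^{1-1/\alpha}}$ and $\eta' = \frac{2\alpha^2}{(\alpha-1)^{1-1/\alpha}}$. Apparently, the boundary, arrival and
completion conditions hold regardless of the scheduling algorithm. We need only show that
the execution of any job set under U-Ceq (UC for short) satisfies the running condition
$\frac{dG_{UC}(\mathcal{J}(t))}{dt} + \frac{d\Phi(t)}{dt} \le c_1 \cdot \frac{dG^*(\mathcal{J}^*(t))}{dt} + c_2 \cdot \frac{dG_1^*(\mathcal{J}(t))}{dt}$, where $c_1$ and $c_2$ are both constants with
respect to $P$.

Following the proof of Theorem 3, we have $\frac{dG_{UC}(\mathcal{J}(t))}{dt} = \frac{\alpha}{\alpha-1} n_t$, $\frac{dG^*(\mathcal{J}^*(t))}{dt} = n_t^* +
\sum_{j=1}^{P} (s_j^*)^\alpha$, and $\frac{dG_1^*(\mathcal{J}(t))}{dt} \ge \frac{\alpha}{(\alpha-1)^{1-1/\alpha}} \sum_{J_i \in \mathcal{J}_S(t)} \frac{\Gamma_i^k(a_i(t))}{(h_i^k)^{1-1/\alpha}} = \frac{\alpha}{\alpha-1} (1 - x_t)\, n_t$. Moreover, the
rate of change $\frac{d\Phi(t)}{dt}$ of the potential function $\Phi(t)$ at time $t$ can be shown to satisfy

$$\frac{d\Phi(t)}{dt} \le \eta' \left( - \frac{x_t^2}{2 (\alpha-1)^{1/\alpha}}\, n_t + \frac{\lambda}{\alpha} \sum_{j=1}^{P} (s_j^*)^\alpha + \frac{1-1/\alpha}{\lambda^{1/(\alpha-1)}}\, n_t + \frac{1}{(\alpha-1)^{1/\alpha}}\, n_t^* \right),$$

where $\lambda = 2^{\alpha-1} (\alpha-1)^{1-1/\alpha}$. Substituting these bounds into the running condition, we
can see that in order to satisfy it for all values of $x_t$, we can set $c_1 = \max\left\{ \frac{2\alpha^2}{\alpha-1},\ 2^\alpha \alpha \right\}$ and
$c_2 = 2\alpha$, which are both constants in terms of $P$. Hence, the theorem is proved. □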

We can see that U-Ceq significantly improves upon any non-clairvoyant algorithm with
respect to the total flow time plus energy, which is essentially a result of not wasting any
energy yet still guaranteeing sufficient execution rates for the jobs. Since we know that
non-clairvoyant algorithms perform similarly to semi-clairvoyant ones with respect to the
total flow time alone [12, 14], this reveals the importance of (even partial) clairvoyance when
energy is also of concern.

Moreover, since U-Ceq takes advantage of the parallelism information of a job, uniform
speed scaling is sufficient to ensure its competitiveness. Therefore, compared to the
non-clairvoyant algorithm N-Equi, which requires nonuniform speed scaling, U-Ceq may be
more practical to implement. It is not hard to see, moreover, that nonuniform speed scaling
is not beneficial in the semi-clairvoyant setting. Instead, it will degrade the performance,
since uniform speeds generally consume less energy at the same execution rate.

4 Makespan Plus Energy 

In this section, we consider the objective of makespan plus energy. In particular, we propose
a semi-clairvoyant algorithm P-First (Parallel-First) and show that it is
$O(\ln^{1-1/\alpha} P)$-competitive for any set of batched (Par-Seq)* jobs. We also show that this ratio is
asymptotically optimal for any semi-clairvoyant algorithm.

4.1 Performances of the Optimal 

We first show that, as far as minimizing makespan plus energy for batched jobs is concerned, the optimal
(online/offline) strategy maintains a constant total power of $\frac{1}{\alpha-1}$ at any time. This
corresponds to the power equality property shown in [24], which applies to any optimal offline
algorithm for the makespan minimization problem with an energy budget.

Lemma 8 For any schedule $A$ on a set $\mathcal{J}$ of batched jobs, there exists a schedule $B$ that
executes the same set of jobs with a constant total power of $\frac{1}{\alpha-1}$ at any time, and performs
no worse than $A$ with respect to makespan plus energy, i.e., $H_B(\mathcal{J}) \le H_A(\mathcal{J})$.

Proof. For any schedule $A$ on a set $\mathcal{J}$ of batched jobs, consider an interval $\Delta t$ during
which the speeds of all processors, denoted as $(s_1, s_2, \ldots, s_P)$, remain unchanged. The
makespan plus energy of $A$ incurred by executing this portion of the job set is given by
$H_A = \Delta t\,(1 + u)$, where $u = \sum_{j=1}^{P} s_j^{\alpha}$ is the power consumption of all processors during $\Delta t$.
We now construct schedule $B$ in such a way that it executes the same portion of the job set
by running the $j$-th processor at speed $k \cdot s_j$, where $k = \left(\frac{1}{(\alpha-1)u}\right)^{1/\alpha}$. This portion will then
finish under schedule $B$ in $\frac{\Delta t}{k}$ time, and the power consumption at any time during this
interval is given by $k^{\alpha} u = \frac{1}{\alpha-1}$. The makespan plus energy of $B$ incurred by executing the same
portion of the job set is $H_B = \frac{\Delta t}{k}\left(1 + \frac{1}{\alpha-1}\right) = \frac{\alpha}{(\alpha-1)^{1-1/\alpha}}\,\Delta t\, u^{1/\alpha}$. Since $\frac{1+u}{u^{1/\alpha}}$ is minimized
when $u = \frac{1}{\alpha-1}$, we have $\frac{H_A}{H_B} = \frac{(\alpha-1)^{1-1/\alpha}}{\alpha} \cdot \frac{1+u}{u^{1/\alpha}} \ge 1$, i.e., $H_A \ge H_B$. Extending the same
argument to all such intervals in schedule $A$ proves the lemma. □
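The rescaling step in this proof is easy to check numerically. The sketch below is an illustration only: the function name and argument layout are hypothetical, and it assumes at least one processor runs at a positive speed during the interval so that $u > 0$.

```python
def rescale_interval(speeds, dt, alpha):
    """Rescale one constant-speed interval so that total power is 1/(alpha-1).

    speeds: per-processor speeds during the interval (not all zero).
    dt:     length of the interval under the original schedule A.
    Returns (cost_A, cost_B, power_B), where cost = time * (1 + power).
    """
    u = sum(s ** alpha for s in speeds)                  # power under A
    k = (1.0 / ((alpha - 1.0) * u)) ** (1.0 / alpha)     # scaling factor
    cost_a = dt * (1.0 + u)

    new_speeds = [k * s for s in speeds]                 # schedule B
    new_dt = dt / k                                      # same work, scaled time
    power_b = sum(s ** alpha for s in new_speeds)
    cost_b = new_dt * (1.0 + power_b)
    return cost_a, cost_b, power_b
```

For any input, `power_b` equals $1/(\alpha-1)$ up to rounding and `cost_b` never exceeds `cost_a`; the two costs coincide exactly when the original interval already ran at total power $1/(\alpha-1)$.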

Compared to total flow time plus energy, where the completion time of each job
contributes to the overall objective function, makespan for a set of jobs is the completion time
of the last job. In this case, the other jobs only contribute to the energy consumption part
of the objective, and thus can be slowed down to consume less energy, eventually leading to
better overall performance. Based on this observation as well as the result of Lemma 8, we
derive the performance of the optimal offline scheduler for any batched (Par-Seq)* job set
in the following lemma.

Lemma 9 The optimal makespan plus energy of any batched set $\mathcal{J}$ of (Par-Seq)* jobs
satisfies $H^*(\mathcal{J}) \ge \frac{\alpha}{(\alpha-1)^{1-1/\alpha}} \cdot \max\left\{ \frac{\sum_{i=1}^{n} w(J_i)}{P^{1-1/\alpha}},\ \left(\sum_{i=1}^{n} l(J_i)^{\alpha}\right)^{1/\alpha} \right\}$.

Proof. Given any job $J_i \in \mathcal{J}$, define $J_{i,P}$ to be a job with a single fully-parallel phase of
the same work as $J_i$, and define $J_{i,S}$ to be a job with a single sequential phase of the same
span as $J_i$. Moreover, we define job set $\mathcal{J}_P$ to be $\mathcal{J}_P = \{J_{i,P} : J_i \in \mathcal{J}\}$ and define $\mathcal{J}_S$ to be
$\mathcal{J}_S = \{J_{i,S} : J_i \in \mathcal{J}\}$. Clearly, the optimal makespan plus energy for $\mathcal{J}_P$ and $\mathcal{J}_S$ will be no
worse than that for the original job set $\mathcal{J}$, i.e., $H^*(\mathcal{J}) \ge H^*(\mathcal{J}_P)$ and $H^*(\mathcal{J}) \ge H^*(\mathcal{J}_S)$,
since the optimal schedule for $\mathcal{J}$ is a valid schedule for $\mathcal{J}_P$ and $\mathcal{J}_S$.

For job set $\mathcal{J}_P$, the optimal scheduler can execute the jobs in any order since all jobs
are fully-parallel in this case. Moreover, by the convexity of the power function, all $P$
processors are run with constant speed $s$. According to Lemma 8, we have $P s^{\alpha} = \frac{1}{\alpha-1}$, hence
$s = \left(\frac{1}{(\alpha-1)P}\right)^{1/\alpha}$. The makespan plus energy is therefore
$H^*(\mathcal{J}_P) = \frac{\sum_{i=1}^{n} w(J_i)}{P s}\,(1 + P s^{\alpha}) = \frac{\alpha}{(\alpha-1)^{1-1/\alpha}} \cdot \frac{\sum_{i=1}^{n} w(J_i)}{P^{1-1/\alpha}}$.

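For job set $\mathcal{J}_S$, allowing the optimal scheduler to have at least $\max\{n, P\}$ processors
can only improve its performance. In this case, the optimal will execute each job on a
single processor with constant speed. Moreover, all jobs are completed simultaneously,
since otherwise jobs completed earlier can be slowed down to save energy without affecting
makespan. Let $s_i$ denote the speed used by the optimal for job $J_{i,S}$, so
$\frac{l(J_1)}{s_1} = \frac{l(J_2)}{s_2} = \cdots = \frac{l(J_n)}{s_n}$, and $\sum_{i=1}^{n} s_i^{\alpha} = \frac{1}{\alpha-1}$ according to Lemma 8. Therefore, the speeds satisfy
$s_i = \frac{1}{(\alpha-1)^{1/\alpha}} \cdot \frac{l(J_i)}{\left(\sum_{j=1}^{n} l(J_j)^{\alpha}\right)^{1/\alpha}}$ for $i = 1, 2, \ldots, n$. The makespan plus energy is
$H^*(\mathcal{J}_S) = \frac{l(J_1)}{s_1} + \frac{l(J_1)}{s_1}\left(\sum_{i=1}^{n} s_i^{\alpha}\right) = \frac{\alpha}{(\alpha-1)^{1-1/\alpha}} \left(\sum_{i=1}^{n} l(J_i)^{\alpha}\right)^{1/\alpha}$. □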
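As a quick illustration, the lower bound of Lemma 9 can be evaluated directly from the total work and the spans of the jobs. The helper below is a sketch with hypothetical names; it assumes the bound has the form $\frac{\alpha}{(\alpha-1)^{1-1/\alpha}} \max\{\sum_i w(J_i)/P^{1-1/\alpha},\ (\sum_i l(J_i)^{\alpha})^{1/\alpha}\}$ derived in the proof above.

```python
def makespan_energy_lower_bound(works, spans, P, alpha):
    """Lower bound on the optimal makespan plus energy for batched jobs.

    works: total work w(J_i) of each job.
    spans: span l(J_i) of each job.
    """
    coeff = alpha / (alpha - 1.0) ** (1.0 - 1.0 / alpha)
    # Bound from treating every job as fully parallel.
    parallel_term = sum(works) / P ** (1.0 - 1.0 / alpha)
    # Bound from treating every job as purely sequential.
    sequential_term = sum(l ** alpha for l in spans) ** (1.0 / alpha)
    return coeff * max(parallel_term, sequential_term)
```

For example, with $\alpha = 2$, $P = 4$, two jobs of work 4 and span 1 each, the parallel term is $8/\sqrt{4} = 4$, the sequential term is $\sqrt{2}$, and the coefficient is 2, so the bound evaluates to 8.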

4.2 Semi-clairvoyant Algorithm: P-FIRST 

We now present a semi-clairvoyant algorithm P-First (Parallel-First) for any batched set
$\mathcal{J}$ of (Par-Seq)* jobs. Basically, P-First executes the fully-parallel phase of
some job whenever one is available, and otherwise executes the sequential phases of all jobs at the same
rate. Specifically, at any time $t$ when there are $n_t$ active jobs, P-First works as shown in
Algorithm 3.

Algorithm 3 P-First
1: if there is at least one active job in fully-parallel phase at any time $t$ then
2:     execute any such job on $P$ processors; each processor runs at speed $\left(\frac{1}{(\alpha-1)P}\right)^{1/\alpha}$
3: else
4:     execute all active jobs on $P' = \min\{n_t, P\}$ processors by equally sharing the processors among the jobs; each processor runs at speed $\left(\frac{1}{(\alpha-1)P'}\right)^{1/\alpha}$
5: end if
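The following Python sketch simulates this rule on a batched job set. It is an illustration under stated assumptions rather than the authors' implementation: jobs are encoded as lists of `("par", work)` and `("seq", span)` phases (a hypothetical format), each processor in use runs at speed $(1/((\alpha-1)P'))^{1/\alpha}$ where $P'$ is the number of processors in use, and when more than $P$ sequential jobs are active they time-share the $P$ processors equally.

```python
def p_first(jobs, P, alpha):
    """Simulate a P-First-style schedule; return (makespan, energy).

    jobs: list of phase lists; each phase is ("par", work) or ("seq", span).
    """
    # Mutable copies of the remaining phases, skipping empty phases.
    pending = [[[kind, amt] for kind, amt in job if amt > 0] for job in jobs]
    pending = [job for job in pending if job]
    power = 1.0 / (alpha - 1.0)          # total power at every instant
    t = 0.0
    while pending:
        par_job = next((j for j in pending if j[0][0] == "par"), None)
        if par_job is not None:
            # Run one fully-parallel phase on all P processors.
            speed = (power / P) ** (1.0 / alpha)
            t += par_job[0][1] / (P * speed)
            par_job.pop(0)
        else:
            # All active jobs are sequential: share min(n_t, P) processors.
            n_t = len(pending)
            used = min(n_t, P)
            speed = (power / used) ** (1.0 / alpha)
            rate = (used / n_t) * speed          # progress rate of each job
            step = min(j[0][1] for j in pending)  # until a phase finishes
            t += step / rate
            for j in pending:
                j[0][1] -= step
            for j in pending:
                if j[0][1] <= 0.0:
                    j.pop(0)
        pending = [j for j in pending if j]
    return t, power * t
```

Because the total power is $1/(\alpha-1)$ at every instant in this sketch, the returned energy is always the makespan divided by $\alpha-1$.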



As we can see, P-First ensures that the overall energy consumption $E(\mathcal{J})$ and the
makespan $M(\mathcal{J})$ of job set $\mathcal{J}$ satisfy $E(\mathcal{J}) = \frac{1}{\alpha-1} M(\mathcal{J})$, since at any time $t$, the total
power is given by $u_t = \frac{1}{\alpha-1}$, and $E(\mathcal{J}) = \int_0^{M(\mathcal{J})} u_t\, dt$. The makespan plus energy of the job
set under P-First thus satisfies $H(\mathcal{J}) = E(\mathcal{J}) + M(\mathcal{J}) = \frac{\alpha}{\alpha-1} M(\mathcal{J})$, and its performance
against the optimal is shown in the following theorem.

Theorem 10 P-First is $O(\ln^{1-1/\alpha} P)$-competitive with respect to makespan plus energy
for any set of batched (Par-Seq)* jobs, where $P$ is the total number of processors.

Proof. Since the makespan plus energy of job set $\mathcal{J}$ scheduled by P-First satisfies
$H(\mathcal{J}) = \frac{\alpha}{\alpha-1} M(\mathcal{J})$, we will mainly focus on the makespan $M(\mathcal{J})$. We bound separately
the time $M'(\mathcal{J})$ when all $P$ processors are utilized and the time $M''(\mathcal{J})$ when fewer than $P$
processors are utilized. Obviously, we have $M(\mathcal{J}) = M'(\mathcal{J}) + M''(\mathcal{J})$.

According to P-First, the execution rate when all $P$ processors are utilized is given by
$\frac{P^{1-1/\alpha}}{(\alpha-1)^{1/\alpha}}$. The total work completed in this case is upper bounded by $\sum_{i=1}^{n} w(J_i)$. Hence,
we have $M'(\mathcal{J}) \le (\alpha-1)^{1/\alpha} \frac{\sum_{i=1}^{n} w(J_i)}{P^{1-1/\alpha}}$. We now bound $M''(\mathcal{J})$ when fewer than $P$ processors
are used, which only occurs while P-First executes sequential phases. Since all jobs are
batch released, the number of active jobs monotonically decreases. Let $T$ denote the first
time when the number of active jobs drops below $P$, and let $m = n_T$. Therefore, we have
$m < P$. For each of the $m$ active jobs $J_i$ at time $T$, let $l_i$ denote the remaining span of the
job. Rename the jobs such that $l_1 \le l_2 \le \cdots \le l_m$. Since P-First executes the sequential
phases of all jobs at the same speed, the sequential phases of the $m$ jobs will complete
exactly in the above order. Define $l_0 = 0$, then we have

$M''(\mathcal{J}) = \sum_{i=1}^{m} \frac{l_i - l_{i-1}}{\left(\frac{1}{(\alpha-1)(m-i+1)}\right)^{1/\alpha}} = (\alpha-1)^{1/\alpha} \sum_{i=1}^{m} \left((m-i+1)^{1/\alpha} - (m-i)^{1/\alpha}\right) l_i$.

For convenience, define $c_i = (m-i+1)^{1/\alpha} - (m-i)^{1/\alpha}$ for $1 \le i \le m$, and we can get
$c_i \le \frac{1}{(m-i+1)^{1-1/\alpha}}$. Let $R = \sum_{i=1}^{m} l_i^{\alpha}$, and subject to
this condition and the ordering of $l_i$, $\sum_{i=1}^{m} c_i l_i$ is maximized when $l_i = R^{1/\alpha} \cdot \frac{c_i^{1/(\alpha-1)}}{\left(\sum_{j=1}^{m} c_j^{\alpha/(\alpha-1)}\right)^{1/\alpha}}$.
Hence, we have $M''(\mathcal{J}) \le (\alpha-1)^{1/\alpha} R^{1/\alpha} \left(\sum_{i=1}^{m} c_i^{\alpha/(\alpha-1)}\right)^{1-1/\alpha} \le (\alpha-1)^{1/\alpha} R^{1/\alpha} H_m^{1-1/\alpha}$, where
$H_m = 1 + 1/2 + \cdots + 1/m$ denotes the $m$-th harmonic number.

The makespan plus energy of the job set scheduled under P-First thus satisfies
$H(\mathcal{J}) \le \frac{\alpha}{(\alpha-1)^{1-1/\alpha}} \left( \frac{\sum_{i=1}^{n} w(J_i)}{P^{1-1/\alpha}} + R^{1/\alpha} H_m^{1-1/\alpha} \right)$. Since it is obvious that $\sum_{i=1}^{n} l(J_i)^{\alpha} \ge \sum_{i=1}^{m} l_i^{\alpha} = R$,
comparing the performance of P-First with that of the optimal in Lemma 9, we have
$H(\mathcal{J}) \le (1 + H_m^{1-1/\alpha}) \cdot H^*(\mathcal{J}) = O(\ln^{1-1/\alpha} P) \cdot H^*(\mathcal{J})$, as $m < P$ and it is well-known that
$H_m = O(\ln m)$. □
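The key inequality in this proof, $M'' \le (\alpha-1)^{1/\alpha} R^{1/\alpha} H_m^{1-1/\alpha}$, can be spot-checked numerically. The sketch below uses hypothetical function names and random remaining spans; it assumes the expression for $M''$ in terms of the coefficients $c_i = (m-i+1)^{1/\alpha} - (m-i)^{1/\alpha}$ given in the proof.

```python
import random


def tail_makespan(spans, alpha):
    """Time P-First spends once fewer than P (sequential) jobs remain."""
    ordered = sorted(spans)
    m = len(ordered)
    total = 0.0
    for i, l in enumerate(ordered, start=1):
        c_i = (m - i + 1) ** (1.0 / alpha) - (m - i) ** (1.0 / alpha)
        total += c_i * l
    return (alpha - 1.0) ** (1.0 / alpha) * total


def tail_bound(spans, alpha):
    """Upper bound (alpha-1)^(1/alpha) * R^(1/alpha) * H_m^(1-1/alpha)."""
    m = len(spans)
    R = sum(l ** alpha for l in spans)
    H_m = sum(1.0 / k for k in range(1, m + 1))
    return ((alpha - 1.0) ** (1.0 / alpha)
            * R ** (1.0 / alpha) * H_m ** (1.0 - 1.0 / alpha))


def check_random_instances(trials=200, seed=0):
    """Return True if the bound held on every randomly drawn instance."""
    rng = random.Random(seed)
    for _ in range(trials):
        alpha = rng.uniform(1.5, 4.0)
        spans = [rng.uniform(0.1, 10.0) for _ in range(rng.randint(1, 30))]
        if tail_makespan(spans, alpha) > tail_bound(spans, alpha) + 1e-9:
            return False
    return True
```

A passing check on random inputs is not a proof, but it is a cheap sanity test of the reconstructed formulas.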

From the proof of Theorem 10, we can see that the competitive ratio of P-First is
dominated by the execution of sequential phases of the (Par-Seq)* jobs. Without knowing
the jobs' future work, the best strategy for any online algorithm seems to be to execute their
sequential phases at the same rate. In the following theorem, we confirm this intuition
by proving a matching lower bound for any semi-clairvoyant algorithm using sequential
jobs only. This result shows that P-First is asymptotically optimal with respect to
makespan plus energy.

Theorem 11 Any semi-clairvoyant algorithm is $\Omega(\ln^{1-1/\alpha} P)$-competitive with respect to
makespan plus energy, where $P$ is the total number of processors.

Proof. Consider a batched set $\mathcal{J}$ of $P$ sequential jobs, where the $i$-th job has span
$l(J_i) = \frac{1}{(P-i+1)^{1/\alpha}}$. Since the number of jobs in this case is the same as the number of
processors, any reasonable algorithm will assign one job to one processor. From Lemma 9, the
optimal offline algorithm has makespan plus energy $H^*(\mathcal{J}) = \frac{\alpha}{(\alpha-1)^{1-1/\alpha}} H_P^{1/\alpha}$, where $H_P$ is
the $P$-th harmonic number. We will show that P-First performs no worse than any
semi-clairvoyant algorithm $A$. From the proof of Theorem 10, we can get

$H_{PF}(\mathcal{J}) = \frac{\alpha}{\alpha-1} M(\mathcal{J}) = \frac{\alpha}{(\alpha-1)^{1-1/\alpha}} \sum_{i=1}^{P} \left((P-i+1)^{1/\alpha} - (P-i)^{1/\alpha}\right) l(J_i) \ge \frac{\alpha}{(\alpha-1)^{1-1/\alpha}} \sum_{i=1}^{P} \frac{1}{\alpha (P-i+1)} = \frac{H_P}{(\alpha-1)^{1-1/\alpha}}$.

Comparing the performances of P-First and the optimal proves the theorem.

To show $H_{PF}(\mathcal{J}) \le H_A(\mathcal{J})$, we construct schedules from $A$ to P-First in three steps
without increasing the total cost. For the schedule produced by $A$, the adversary always
assigns the $i$-th job to the processor that first completes $\frac{1}{(P-i+1)^{1/\alpha}}$ amount of work, with
ties broken arbitrarily. For convenience, we let the $i$-th job be assigned to the $i$-th processor.
First, we construct schedule $A'$ from $A$ by executing each job $J_i$ with constant speed $s'_i$
derived by taking the average speed of processor $i$ in $A$. Based on the convexity of the
power function, the completion time of each job remains the same in $A'$ but the energy
may be reduced. Thus, we have $H_{A'}(\mathcal{J}) \le H_A(\mathcal{J})$. According to the adversarial strategy,
the processor speeds in $A'$ satisfy $s'_1 \ge s'_2 \ge \cdots \ge s'_P$. We then construct schedule $A''$
by executing each job $J_i$ with speed $s'_P$ throughout its execution. Since we also have
$l(J_1) < l(J_2) < \cdots < l(J_P)$, the makespan in $A''$ is still determined by job $J_P$ and is the
same as that in $A'$, but the energy may be reduced by slowing down other jobs. Thus,
we have $H_{A''}(\mathcal{J}) \le H_{A'}(\mathcal{J})$. Note that the speeds of all processors are the same in $A''$
now. According to Lemma 8, we can construct schedule $B$ from $A''$ such that it consumes
constant total power of $\frac{1}{\alpha-1}$ at any time and $H_B(\mathcal{J}) \le H_{A''}(\mathcal{J})$. By observing that $B$ is
identical to P-First, the proof is complete. □
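To see how the gap grows with $P$, one can evaluate both costs on the adversarial instance used in this proof. The sketch below is illustrative only (the function name is hypothetical); it assumes the closed forms $H^* = \frac{\alpha}{(\alpha-1)^{1-1/\alpha}} H_P^{1/\alpha}$ and $H_{PF} = \frac{\alpha}{(\alpha-1)^{1-1/\alpha}} \sum_i ((P-i+1)^{1/\alpha} - (P-i)^{1/\alpha})\, l(J_i)$ from the proof.

```python
def lower_bound_instance(P, alpha):
    """Return (ratio, guaranteed) for the instance with l(J_i) = (P-i+1)^(-1/alpha).

    ratio:      cost of P-First divided by the optimal cost on this instance.
    guaranteed: the analytic lower bound H_P^(1-1/alpha) / alpha on that ratio.
    """
    spans = [(P - i + 1) ** (-1.0 / alpha) for i in range(1, P + 1)]
    H_P = sum(1.0 / k for k in range(1, P + 1))
    coeff = alpha / (alpha - 1.0) ** (1.0 - 1.0 / alpha)

    optimal = coeff * H_P ** (1.0 / alpha)
    p_first = coeff * sum(
        ((P - i + 1) ** (1.0 / alpha) - (P - i) ** (1.0 / alpha)) * l
        for i, l in enumerate(spans, start=1)
    )
    return p_first / optimal, H_P ** (1.0 - 1.0 / alpha) / alpha
```

On this instance the measured ratio always sits at or above the analytic bound, and both grow slowly with $P$, in line with the $\ln^{1-1/\alpha} P$ behavior.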



5 Discussions 

In this paper, we assumed a parallel job model where each phase of a job can only take a 
linear speedup function. However, the model used in [TJJ [13j [151 E] does not restrict the 
speedup function to be strictly linear; instead, they assumed a more general model, where 
each phase can have a sub-linear and non-decreasing speedup. How to devise a similar
general model that is compatible with the nonuniform speed scaling policy and the
semi-clairvoyant processor allocation policy is an interesting problem to consider. In addition,
compared to the result of N-Equi with respect to the total flow time plus energy, Chan,
Edmonds and Pruhs [9] showed that MultiLaps achieves the same asymptotic result of
$O(\log P)$-competitiveness as well as the same lower bound of $\Omega(\log^{1/\alpha} P)$-competitiveness.
However, their results assumed a different execution model than ours. It is interesting 
that N-Equi and MultiLaps achieve identical asymptotic results in these scenarios under 
two different execution models, and it should be useful to further illuminate the relation 
between the two models and identify more fundamental issues in multiprocessor speed 
scaling. Moreover, it is also desirable to obtain tighter upper or lower bounds for arbitrarily 
released jobs under either execution model. 

For the objective of makespan plus energy, which is considered for the first time in the
literature, we have only studied the performance of semi-clairvoyant algorithms on (Par-Seq)*
jobs. How to deal with jobs with arbitrary parallelism profiles, and what performance is
achievable in the non-clairvoyant setting, remain interesting problems to consider. In particular,
comparing the known performance ratios of semi-clairvoyant and non-clairvoyant algorithms
with respect to both objective functions, we conjecture that minimizing makespan plus energy
is inherently more difficult than minimizing total flow time plus energy, and hence is likely
to incur a much larger lower bound in the non-clairvoyant setting. The intuition is that
a non-clairvoyant algorithm for makespan plus energy can potentially make mistakes not
only in speed assignment, but also in processor allocation for the jobs. The former mistake
leads to bad performance since jobs that complete early could in fact have been slowed down to save
energy, and this contributes to the lower bound of semi-clairvoyant algorithms shown in
this paper. The situation may deteriorate further in the non-clairvoyant setting, as more
energy will be wasted or a slower execution rate will result if a wrong number of processors
is also allocated to a job.

Appendix A. Proof of Lemma 6

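Lemma 6 For any $P \ge 1$, $\alpha > 1$ and $b > 0$, subject to the condition that $\sum_{j=1}^{P} s_j^{\alpha} = b$
and $s_1 \ge s_2 \ge \cdots \ge s_P > 0$, $\max_{1 \le h \le P} \frac{h^{1-1/\alpha}}{\sum_{j=1}^{h} s_j}$ is minimized when $(s_1, s_2, \ldots, s_P)$ satisfy

$\frac{h^{1-1/\alpha}}{\sum_{j=1}^{h} s_j} = \frac{(h-1)^{1-1/\alpha}}{\sum_{j=1}^{h-1} s_j}$ for all $h = 2, \ldots, P$.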

Proof. To prove this lemma, we transform the stated problem into a convex optimization 
problem. We then show that our proposed solution satisfies the KKT condition, which 
is known to be a sufficient condition for the optimality of convex minimization problems. 
This then leads to the proof of the lemma. First, by introducing a variable y, the original 
optimization problem can be transformed into the following minimization problem: 

minimize $y$

subject to $\sum_{j=1}^{P} s_j^{\alpha} = b$ (3)

$s_j \ge s_{j+1}$ for $j = 1, \ldots, P-1$

$y \ge \frac{h^{1-1/\alpha}}{\sum_{j=1}^{h} s_j}$ for $h = 1, \ldots, P$

However, the above minimization problem is not convex because its equality constraint
(Eq. (3)) is not linear. Substituting $z_j = s_j^{\alpha}$, we transform it into a convex optimization
problem as follows.

minimize $y$

subject to $\sum_{j=1}^{P} z_j = b$ (4)

$z_{j+1} - z_j \le 0$ for $j = 1, \ldots, P-1$ (5)

$\frac{h^{1-1/\alpha}}{\sum_{j=1}^{h} z_j^{1/\alpha}} - y \le 0$ for $h = 1, \ldots, P$ (6)

For this minimization problem, the objective function and the only equality constraint
(Eq. (4)) are linear, and the inequality constraints (Inequalities (5) and Inequalities (6)) are
convex. Note that Inequalities (6) are convex because of the fact that $1/f(x)$ is a convex
function if $f(x)$ is a positive concave function, and $\sum_{j=1}^{h} z_j^{1/\alpha}$ is concave since $z_j^{1/\alpha}$ is
concave for $\alpha > 1$. We have now transformed our min-max optimization problem into a
convex minimization problem. We will prove that our proposed solution $(y^*, z_1^*, \ldots, z_P^*)$,
which has the form

$y^* = \frac{h^{1-1/\alpha}}{\sum_{j=1}^{h} (z_j^*)^{1/\alpha}}$ for $h = 1, \ldots, P$, (7)

is the optimal solution by showing that it satisfies the KKT condition. Let $x_j = j^{1-1/\alpha} - (j-1)^{1-1/\alpha}$
for $j = 1, \ldots, P$, and apparently we have $x_j > x_{j+1}$. From Eq. (7) and the equality
constraint (Eq. (4)), we can get $z_j^* = b \cdot \frac{x_j^{\alpha}}{\sum_{i=1}^{P} x_i^{\alpha}}$ and therefore $z_j^* > z_{j+1}^*$ for $j = 1, \ldots, P-1$.

To prove that $(y^*, z_1^*, \ldots, z_P^*)$ satisfies the KKT condition, we need to show that it satisfies
primal feasibility, dual feasibility, complementary slackness, and stationarity. It is not hard
to see that the proposed solution satisfies primal feasibility for Eq. (4), Inequalities (5)
and Inequalities (6). Let us now associate multipliers with the constraints:

$\lambda$ : $\sum_{j=1}^{P} z_j = b$

$w_j$ : $z_{j+1} - z_j \le 0$ for $j = 1, \ldots, P-1$

$\mu_h$ : $\frac{h^{1-1/\alpha}}{\sum_{j=1}^{h} z_j^{1/\alpha}} - y \le 0$ for $h = 1, \ldots, P$

Since we have $z_j^* > z_{j+1}^*$ for $j = 1, \ldots, P-1$, to satisfy complementary slackness, we
get $w_j = 0$ for $j = 1, \ldots, P-1$. Now we need to show that there exist $\lambda$ and $\mu_h \ge 0$ such
that dual feasibility and stationarity are satisfied. To derive the stationarity condition, let us
look at the Lagrangian function:

$L(y, z_j, \lambda, \mu_h) = y + \sum_{h=1}^{P} \mu_h \left( \frac{h^{1-1/\alpha}}{\sum_{j=1}^{h} z_j^{1/\alpha}} - y \right) + \lambda \left( \sum_{j=1}^{P} z_j - b \right)$

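Taking the derivative of the Lagrangian function with respect to $y$ and each $z_i$, and substituting
$(y^*, z_1^*, \ldots, z_P^*)$ into it, we get the following set of stationarity conditions:

$\sum_{h=1}^{P} \mu_h = 1$ (8)

$\frac{(y^*)^2 (z_i^*)^{1/\alpha - 1}}{\alpha} \left( \sum_{h=i}^{P} \frac{\mu_h}{h^{1-1/\alpha}} \right) = \lambda$ for $i = 1, \ldots, P$. (9)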

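Solving Eq. (9), we have $\mu_h = c_h \cdot \lambda$, where $c_h = \frac{\alpha\, h^{1-1/\alpha} \left( (z_h^*)^{1-1/\alpha} - (z_{h+1}^*)^{1-1/\alpha} \right)}{(y^*)^2}$, for each
$h = 1, \ldots, P$, and $z_{P+1}^*$ is defined to be 0. According to the values of $(y^*, z_1^*, \ldots, z_P^*)$, we
know that $y^* > 0$, $h^{1-1/\alpha} > 0$ and $z_h^* > z_{h+1}^*$. Therefore, we have $c_h > 0$ for $h = 1, \ldots, P$.
Substituting $\mu_h = c_h \cdot \lambda$ into Eq. (8), we get $\lambda = \frac{1}{\sum_{h=1}^{P} c_h} > 0$, which implies that $\mu_h > 0$ for
all $h = 1, \ldots, P$. Thus, we have shown that dual feasibility is satisfied (complementary slackness
for $\mu_h$ also holds, since Eq. (7) makes every constraint in Inequalities (6) tight). Moreover, there
exist $\lambda$ and $\mu_h$ that make our proposed solution $(y^*, z_1^*, \ldots, z_P^*)$ satisfy stationarity,
hence the KKT condition. Therefore, it is the optimal solution for the convex minimization
problem, and the corresponding speed assignment $s_j^* = (z_j^*)^{1/\alpha}$ is optimal for the original
optimization problem. □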
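The closed form obtained in this proof can be checked numerically against random feasible speed vectors. The sketch below is an illustration with hypothetical function names; it assumes the solution $z_j^* = b\, x_j^{\alpha} / \sum_i x_i^{\alpha}$ with $x_j = j^{1-1/\alpha} - (j-1)^{1-1/\alpha}$ and $s_j^* = (z_j^*)^{1/\alpha}$.

```python
import random


def lemma6_speeds(P, alpha, b):
    """Speeds s_j^* = (b * x_j^alpha / sum_i x_i^alpha)^(1/alpha)."""
    e = 1.0 - 1.0 / alpha
    x = [j ** e - (j - 1) ** e for j in range(1, P + 1)]
    norm = sum(v ** alpha for v in x)
    return [(b / norm) ** (1.0 / alpha) * v for v in x]


def objective(speeds, alpha):
    """max over h of h^(1-1/alpha) / (s_1 + ... + s_h)."""
    best, prefix = 0.0, 0.0
    for h, s in enumerate(speeds, start=1):
        prefix += s
        best = max(best, h ** (1.0 - 1.0 / alpha) / prefix)
    return best


def random_feasible(P, alpha, b, rng):
    """A random non-increasing positive speed vector with sum of s^alpha = b."""
    raw = sorted((rng.uniform(0.01, 1.0) for _ in range(P)), reverse=True)
    scale = (b / sum(v ** alpha for v in raw)) ** (1.0 / alpha)
    return [scale * v for v in raw]
```

At the closed-form speeds, every ratio $h^{1-1/\alpha} / \sum_{j \le h} s_j$ takes the same value $1/s_1^*$, and no randomly drawn feasible vector should achieve a smaller objective.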